On Lasso refitting strategies
A well-known drawback of l_1-penalized estimators is the systematic shrinkage
of the large coefficients towards zero. A simple remedy is to treat Lasso as a
model-selection procedure and to perform a second refitting step on the
selected support. In this work we formalize the notion of refitting and provide
oracle bounds for arbitrary refitting procedures of the Lasso solution. One of
the most widely used refitting techniques, based on Least-Squares, may
raise an interpretability issue, since the signs of the refitted estimator
might be flipped with respect to the original estimator. This problem arises
from the fact that Least-Squares refitting considers only the support of
the Lasso solution, discarding any information about signs or amplitudes. To
address this, we define a sign-consistent refitting as an arbitrary refitting
procedure that preserves the signs of the first-step Lasso solution, and we provide oracle
inequalities for such estimators. Finally, we consider special refitting
strategies: Bregman Lasso and Boosted Lasso. Bregman Lasso has the useful
property of converging to the Sign-Least-Squares refitting (Least-Squares with
sign constraints), which provides greater interpretability. We additionally
study the Bregman Lasso refitting in the case of an orthogonal design,
providing simple intuition behind the proposed method. Boosted Lasso, in
contrast, takes into account the magnitudes of the first Lasso step and allows
us to derive better oracle rates for prediction. Finally, we conduct an
extensive numerical study to show the advantages of one approach over the
others in different synthetic and semi-real scenarios.
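To make the two refitting strategies above concrete, here is a minimal sketch: a plain ISTA solver stands in for the Lasso (for self-containedness), followed by ordinary Least-Squares refitting on the selected support and a sign-constrained Least-Squares refit implemented with SciPy's `nnls` after flipping column signs. Function names and tuning values are illustrative, not the paper's.

```python
import numpy as np
from scipy.optimize import nnls

def lasso_ista(X, y, lam, n_iter=2000):
    """Solve (1/2)||y - X b||^2 + lam * ||b||_1 by ISTA (proximal gradient)."""
    b = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y))
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return b

def ls_refit(X, y, b_lasso):
    """Least-Squares refitting: refit on the Lasso support, ignoring signs."""
    S = np.flatnonzero(b_lasso)
    b = np.zeros_like(b_lasso)
    b[S], *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
    return b

def sign_ls_refit(X, y, b_lasso):
    """Sign-Least-Squares refitting: least squares on the support, constrained
    to keep the signs of the first-step Lasso solution (NNLS after sign flips)."""
    S = np.flatnonzero(b_lasso)
    signs = np.sign(b_lasso[S])
    coef, _ = nnls(X[:, S] * signs, y)  # coef >= 0, so signs * coef keeps the Lasso signs
    b = np.zeros_like(b_lasso)
    b[S] = signs * coef
    return b
```

By construction the plain LS refit attains the smallest residual on the support, while the sign-constrained refit trades a (possibly) larger residual for sign consistency with the first-step estimator.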
Minimax semi-supervised confidence sets for multi-class classification
In this work we study the semi-supervised framework of confidence set classification with controlled expected size in minimax settings. We obtain semi-supervised minimax rates of convergence under the margin assumption and a Hölder condition on the regression function. Besides, we show that if no further assumptions are made, there is no supervised method that outperforms the semi-supervised estimator proposed in this work. We establish that the best achievable rate for any supervised method is n^{-1/2}, even if the margin assumption is extremely favorable. In contrast, semi-supervised estimators can achieve faster rates of convergence provided that sufficiently many unlabeled samples are available. We additionally perform a numerical evaluation of the proposed algorithms, empirically confirming our theoretical findings.
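A hypothetical sketch of the mechanism behind confidence sets with controlled expected size: estimated class scores are thresholded, and the threshold is calibrated on an unlabeled pool so that the empirical average set size matches a target. This illustrates only the general plug-in idea, not the estimator analyzed in the paper; all names are made up.

```python
import numpy as np

def calibrate_threshold(scores_unlab, target_size):
    """Choose a threshold t so that the average number of classes with
    score >= t over the unlabeled pool matches the target expected set size.
    Scores play the role of estimated class posteriors."""
    n = scores_unlab.shape[0]
    flat = np.sort(scores_unlab.ravel())[::-1]  # all scores, descending
    k = int(round(target_size * n))             # total entries to keep above t
    k = min(max(k, 1), flat.size)
    return flat[k - 1]

def confidence_sets(scores, t):
    """Class j belongs to the confidence set of sample i iff scores[i, j] >= t."""
    return scores >= t
```

On held-out data with similarly distributed scores, the empirical expected set size then concentrates around the target.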
Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness
We consider contextual bandit problems with knapsacks [CBwK], a problem where
at each round, a scalar reward is obtained and vector-valued costs are
suffered. The learner aims to maximize the cumulative rewards while ensuring
that the cumulative costs are lower than some predetermined cost constraints.
We assume that contexts come from a continuous set, that costs can be signed,
and that the expected reward and cost functions, while unknown, may be
uniformly estimated -- a typical assumption in the literature. In this setting,
total cost constraints had so far to be at least of order T^{3/4}, where T
is the number of rounds, and were even typically assumed to depend linearly on
T. We are however motivated to use CBwK to impose a fairness constraint of
equalized average costs between groups: the budget associated with the
corresponding cost constraints should be as close as possible to the natural
deviations, of order √T. To that end, we introduce a dual strategy
based on projected-gradient-descent updates that is able to deal with
total-cost constraints of the order of √T, up to poly-logarithmic terms.
This strategy is more direct and simpler than existing strategies in the
literature. It relies on a careful, adaptive tuning of the step size.
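The dual mechanism described above can be sketched in a simplified setting where the mean reward and cost functions are known (the actual CBwK strategy works with uniform estimates of them, and with contexts; the names and constants below are illustrative): pick the action maximizing the Lagrangian reward minus the dual-weighted cost, then take a gradient step on the dual variable driven by the constraint violation, projected onto a Euclidean ball (costs may be signed, hence a ball rather than the nonnegative orthant).

```python
import numpy as np

def dual_cbwk(reward, cost, actions, T, budget_per_round, gamma, lam_max):
    """Sketch of a dual projected-gradient strategy for bandits with knapsacks."""
    b = np.atleast_1d(np.asarray(budget_per_round, dtype=float))
    lam = np.zeros_like(b)                      # dual variable (one per cost)
    total_reward = 0.0
    total_cost = np.zeros_like(b)
    for _ in range(T):
        # greedy action against the current Lagrangian
        a = max(actions,
                key=lambda act: reward(act)
                - lam @ np.atleast_1d(np.asarray(cost(act), dtype=float)))
        c = np.atleast_1d(np.asarray(cost(a), dtype=float))
        total_reward += reward(a)
        total_cost += c
        lam = lam + gamma * (c - b)             # dual ascent on the violation
        norm = np.linalg.norm(lam)
        if norm > lam_max:                      # projection onto the ball
            lam *= lam_max / norm
    return total_reward, total_cost
```

The dual variable rises whenever spending exceeds the per-round budget, making costly actions less attractive, so the average cost is steered toward the budget.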
Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm
This work studies minimization problems with zero-order noisy oracle
information under the assumption that the objective function is highly smooth
and possibly satisfies additional properties. We consider two kinds of
zero-order projected gradient descent algorithms, which differ in the form of
the gradient estimator. The first algorithm uses a gradient estimator based on
randomization over the l_2 sphere due to Bach and Perchet (2016). We
present an improved analysis of this algorithm on the class of highly smooth
and strongly convex functions studied in the prior work, and we derive rates of
convergence for two more general classes of non-convex functions. Namely, we
consider highly smooth functions satisfying the Polyak-Łojasiewicz condition
and the class of highly smooth functions with no additional property. The
second algorithm is based on randomization over the l_1 sphere, and it
extends to the highly smooth setting the algorithm that was recently proposed
for Lipschitz convex functions in Akhavan et al. (2022). We show that, in the
case of a noiseless oracle, this novel algorithm enjoys better bounds on bias
and variance than the l_2 randomization and the commonly used Gaussian
randomization algorithms, while in the noisy case both the l_1 and l_2
algorithms benefit from similar improved theoretical guarantees. The
improvements are achieved thanks to new proof techniques based on
Poincaré-type inequalities for uniform distributions on the l_1 or l_2
spheres. The results are established under weak (almost adversarial)
assumptions on the noise. Moreover, we provide minimax lower bounds proving
optimality or near-optimality of the obtained upper bounds in several cases.
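As a minimal sketch of the basic mechanism, here is a two-point zero-order gradient estimator with l_2-sphere randomization inside (projected) gradient descent. The estimators analyzed in the paper additionally use smoothing kernels to exploit smoothness of order higher than two, so this is only the underlying template, with illustrative names and step sizes.

```python
import numpy as np

def l2_sphere(d, rng):
    """Draw a point uniformly on the unit l_2 sphere in dimension d."""
    z = rng.standard_normal(d)
    return z / np.linalg.norm(z)

def zo_gradient_descent(f, x0, h, step, n_iter, rng, radius=None):
    """Zero-order gradient descent with the two-point estimator
        g = d * (f(x + h z) - f(x - h z)) / (2 h) * z,   z ~ Unif(sphere),
    whose expectation approximates the gradient of a smoothed version of f."""
    x = np.array(x0, dtype=float)
    d = x.size
    for _ in range(n_iter):
        z = l2_sphere(d, rng)
        g = d * (f(x + h * z) - f(x - h * z)) / (2 * h) * z
        x = x - step * g
        if radius is not None:                  # projection onto a Euclidean ball
            norm = np.linalg.norm(x)
            if norm > radius:
                x *= radius / norm
    return x
```

Only two function evaluations per iteration are needed, which is the defining feature of the zero-order oracle model.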
Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification
We study the problem of fair binary classification using the notion of Equal Opportunity. It requires the true positive rate to be the same across the sensitive groups. Within this setting we show that the fair optimal classifier is obtained by recalibrating the Bayes classifier with a group-dependent threshold. We provide a constructive expression for the threshold. This result motivates us to devise a plug-in classification procedure based on both unlabeled and labeled datasets. While the latter is used to learn the output conditional probability, the former is used for calibration. The overall procedure can be computed in polynomial time and is shown to be statistically consistent both in terms of the classification error and the fairness measure. Finally, we present numerical experiments which indicate that our method is often superior to or competitive with the state-of-the-art methods on benchmark datasets.
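A hedged sketch of the plug-in idea: estimate the regression function eta(x) = P(Y = 1 | X = x) on labeled data, then use unlabeled data to choose a group-dependent threshold that equalizes a plug-in estimate of the true positive rate across groups (eta stands in for the unseen labels). The concrete calibration rule below is illustrative, not the paper's exact procedure.

```python
import numpy as np

def eo_thresholds(eta, group, level):
    """For each sensitive group s, pick a threshold t_s so that the plug-in
    true-positive-rate estimate, within group s,
        sum of eta over {eta >= t_s}  /  sum of eta,
    is as close as possible to the common target `level`."""
    thresholds = {}
    for s in np.unique(group):
        e = np.sort(eta[group == s])          # ascending candidate thresholds
        tail = np.cumsum(e[::-1])[::-1]       # tail[i] = sum of e[i:]
        tpr = tail / e.sum()                  # non-increasing in the index
        idx = int(np.argmin(np.abs(tpr - level)))
        thresholds[s] = e[idx]
    return thresholds

def eo_classify(eta, group, thresholds):
    """Plug-in fair classifier: predict 1 iff eta exceeds the group threshold."""
    t = np.array([thresholds[s] for s in group])
    return (eta >= t).astype(int)
```

Because both groups are calibrated to the same target level, the resulting classifier approximately satisfies Equal Opportunity under the plug-in estimate of the true positive rate.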
Fair Regression via Plug-in Estimator and Recalibration With Statistical Guarantees
We study the problem of learning an optimal regression function subject to a fairness constraint. The constraint requires that, conditionally on the sensitive feature, the distribution of the function's output remains the same. It naturally extends the notion of demographic parity, often used in classification, to the regression setting. We tackle this problem by leveraging a discretized proxy version, for which we derive an explicit expression of the optimal fair predictor. This result naturally suggests a two-stage approach, in which we first estimate the (unconstrained) regression function from a set of labeled data and then recalibrate it with another set of unlabeled data. The recalibration step can be performed efficiently via smooth optimization. We derive rates of convergence of the proposed estimator to the optimal fair predictor both in terms of the risk and the fairness constraint. Finally, we present numerical experiments illustrating that the proposed method is often superior to or competitive with state-of-the-art methods.
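A sketch of quantile-based recalibration enforcing demographic parity: push each group's predictions through its own empirical CDF, then through a weighted average of the group quantile functions (a Wasserstein-barycenter-style map). The paper's recalibration step is a smooth optimization; this only illustrates the target object, with made-up names.

```python
import numpy as np

def dp_recalibrate(pred_unlab, group_unlab):
    """Build a recalibration map from unlabeled predictions: each group's
    values are mapped to their within-group quantile level q, then to the
    group-weighted average of quantiles at level q, so that all groups share
    (approximately) the same output distribution."""
    groups = np.unique(group_unlab)
    weights = {s: np.mean(group_unlab == s) for s in groups}
    sorted_by_group = {s: np.sort(pred_unlab[group_unlab == s]) for s in groups}

    def recalibrate(pred, group):
        out = np.empty(pred.shape, dtype=float)
        for s in groups:
            m = group == s
            ref = sorted_by_group[s]
            # empirical CDF rank of each prediction within its own group
            q = np.searchsorted(ref, pred[m], side="right") / ref.size
            q = np.clip(q, 0.0, 1.0)
            # weighted average of group quantiles at those levels
            out[m] = sum(weights[g] * np.quantile(sorted_by_group[g], q)
                         for g in groups)
        return out

    return recalibrate
```

After recalibration, every group's outputs follow (up to sampling error) the same barycenter distribution, which is how demographic parity is approximately achieved without using any labels.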